From the World Wide Web to Digital Library Stacks: Preserving the French Web Archives

Authors

  • Clément Oury
  • Sébastien Peyrard
Abstract

The National Library of France (BnF) is mandated by French law to collect and preserve the French Internet. The project is now 10 years old, with collections ranging from 1996 to the present. To ensure their long-term preservation, the library chose to ingest these web archives into its existing digital preservation repository, SPAR (Scalable Preservation and Archiving Repository). There were numerous implementation challenges, on the modeling as well as the technical side, which the library met with solutions drawn from international collaboration and, whenever possible, widely adopted standards:

  • Web archive-specific formats (ARC and WARC files) lacked validation and characterization tools, which led to the development of a JHOVE2 module for the ARC format.
  • The heterogeneity of BnF's web archives in terms of formats, production workflows and tools was managed by aligning all of them on a single model, the current production workflow using NetarchiveSuite.
  • The specificities of web archives were matched to the PREMIS data model and dictionary and to SPAR's global METS profile.
  • Finally, the need to express technical information about ARC files in a concise, manageable fashion led to the definition of a format-specific metadata scheme for container files, containerMD, which will be released to the preservation community (on BnF's website).

All this development work means new services for digital curators in general and preservation experts in particular. They will be able to know their collection better, to check its comprehensiveness, and, with that deeper understanding, to investigate new preservation strategies. Allowing differentiated service level agreements for specific sets of documents, with richer metadata extraction, better quality assurance and differentiated preservation strategies, will be the logical next step of the web archives long-term preservation project.

Similar resources

Acquiring and providing access to historical web collections

Every day, unique and valuable information that describes our current times disappears from the web. National archives and libraries have kept cultural heritage for centuries by collecting and preserving the objects and printed media of past generations. Now, it is mandatory to preserve digital cultural heritage in the form of web content. The Portuguese Web Archive project began in 2008. Since then, ...

Towards an Ontology for Describing Archival Resources

Several digital libraries and archives are emerging around the world due to the need to store, organize and make available on the Web a lot of resource collections. However, managing this information poses new challenges in order to overcome traditional data management and information browsing. Semantic Web technologies can improve digital libraries and archives by facilitating metadata storage...

The World Wide Web

The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public...

Migrating Content in WARC Files

Heritage institutions all over the world have started harvesting and preserving resources of the World Wide Web for future generations as part of our cultural heritage. This task is non-trivial because of two complex challenges: (1) crawling the enormous amount of data on the Internet and (2) performing long-term preservation strategies on these data. Nowadays a lot of effort i...

Digital Imaging for Archival Preservation and Online Presentation: Best Practices

Executive Summary: The following information aims to provide a general overview of digital imaging, specifically the presentation of visual images on the World Wide Web and the digital conversion of records for the purposes of archival preservation. In terms of digital imaging for libraries, archives and museums, web access is obviously not the only major issue to be dealt with. The burgeoning f...

Publication date: 2011